View Validation: A Case Study for Wrapper Induction and Text Classification
نویسندگان
چکیده
Wrapper induction algorithms, which use labeled examples to learn extraction rules, are a crucial component of information agents that integrate semi-structured information sources. Multi-view wrapper induction algorithms reduce the amount of training data by exploiting several types of rules (i.e., views), each of which being sufficient to extract the relevant data. All multiview algorithms rely on the assumption that the views are sufficiently compatible for multi-view learning (i.e., most examples are labeled identically in all views). In practice, it is unclear whether or not two views are sufficiently compatible for solving a new, unseen learning task. In order to cope with this problem, we introduce a view validation algorithm: given a learning task, the algorithm predicts whether or not the views are sufficiently compatible for solving that particular task. We use information acquired while solving several exemplar learning tasks to train a classifier that discriminates between the tasks for which the views are sufficiently and insufficiently compatible for multi-view learning. For both wrapper induction and text classification, view validation requires only a modest amount of training data to make high accuracy predictions.
منابع مشابه
Adaptive View Validation: A First Step Towards Automatic View Detection
Multi-view algorithms reduce the amount of required training data by partitioning the domain features into separate subsets or views that are sufficient to learn the target concept. Such algorithms rely on the assumption that the views are sufficiently compatible for multi-view learning (i.e., most examples are labeled identically in all views). In practice, it is unclear whether or not two vie...
متن کاملFast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
متن کاملImproving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
متن کاملActive Learning with Strong and Weak Views: A Case Study on Wrapper Induction
Multi-view learners reduce the need for labeled data by exploiting disjoint sub-sets of features (views), each of which is sufficient for learning. Such algorithms assume that each view is a strong view (i.e., perfect learning is possible in each view). We extend the multi-view framework by introducing a novel algorithm, Aggressive Co-Testing, that exploits both strong and weak views; in a weak...
متن کاملFeature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine
Different approaches have been proposed for feature selection to obtain suitable features subset among all features. These methods search feature space for feature subsets which satisfies some criteria or optimizes several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, features subsets are selected due to some measu...
متن کامل